notebooks/Unit 9 - Model Optimization/optimum.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Optimum\n", "\n", "This notebook demonstrate on how to use the Optimum to perform quantization of models hosted on the Hugging Face Hub using the ONNX Runtime quantization tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup Optimum\n", "\n", "First, let's install optimum and import required modules." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install optimum" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from optimum.onnxruntime import ORTQuantizer, ORTModelForImageClassification\n", "from functools import partial\n", "from optimum.onnxruntime.configuration import AutoQuantizationConfig, AutoCalibrationConfig\n", "from onnxruntime.quantization import QuantType\n", "from transformers import AutoFeatureExtractor\n", "from PIL import Image\n", "import requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load ONNX Runtime Model\n", "\n", "Load the ONNX Runtime model from Huggingface Hub. We will be using the Vision Transformer `vit-base-patch16-224`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preprocessor = AutoFeatureExtractor.from_pretrained(\"optimum/vit-base-patch16-224\")\n", "model = ORTModelForImageClassification.from_pretrained(\"optimum/vit-base-patch16-224\")\n", "model.save_pretrained(\"models/vit-base-patch16-224\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dynamic Quantization\n", "\n", "Similar to ONNX Runtime quantization, dynamic quantization calculates the parameters to be quantized for activations dynamically which increase the accuracy of the model but increase the latency as well.\n", "\n", "To perform dynamic quantization, first create quantizer using `ORTQuantizer` class and define the configuration using `AutoQuantizationConfig` before calling `quantize()` method to quantize the model." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "quantizer = ORTQuantizer.from_pretrained(model)\n", "dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)\n", "dqconfig.weights_dtype = QuantType.QUInt8\n", "model_quantized_path = quantizer.quantize(\n", " save_dir=\"models/vit-base-patch16-224-quantized-dynamic\",\n", " quantization_config=dqconfig,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check Model Size\n", "\n", "Compare the size of the original model and the quantized model.\n", "\n", "Size of original model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%ls -lh models/vit-base-patch16-224" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Size of quantized model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%ls -lh models/vit-base-patch16-224-quantized-dynamic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check Model Result\n", "\n", "Next, we will validate the quantized model by comparing the result of the original model and the quantized model.\n", "We created a function to perform inference when given model, processor and image then return the classification result based on the ImageNet label" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "def infer_ImageNet(classification_model, processor, image):\n", " inputs = processor(images=image, return_tensors=\"pt\")\n", " outputs = classification_model(**inputs)\n", " logits = outputs.logits\n", " predicted_class_idx = logits.argmax(-1).item()\n", " return classification_model.config.id2label[predicted_class_idx]\n", "\n", "# Get sample image\n", "url = 'http://images.cocodataset.org/val2017/000000039769.jpg'\n", "image = Image.open(requests.get(url, stream=True).raw)\n", "\n", "res = infer_ImageNet(model, preprocessor, image)\n", "print(\"Original model prediction:\", res)\n", "\n", "quantized_model = ORTModelForImageClassification.from_pretrained(model_quantized_path)\n", "dq_res = infer_ImageNet(quantized_model, preprocessor, image)\n", "print(\"Quantized model prediction:\", dq_res)\n", "display(image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Static Quantization\n", "\n", "For static quantization, similar to ONNX Runtime static quantization, parameters are quantized first using the calibration dataset. This method is faster than dynamic quantization but the accuracy is lower. \n", "\n", "When using Optimum, claibration dataset can be created using `quantizer.get_calibration_dataset(()` method which take any datsets from HuggingFace Hub or local folder. Once calibration dataset is created and calibration configuration defined using `AutoCalibrationConfig.minmax()`, perform calibration by calling `quantizer.fit()` method and then quantize the model using `quantizer.quantize()` method." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "quantizer = ORTQuantizer.from_pretrained(model)\n", "static_qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)\n", "\n", "# Create the calibration dataset\n", "def preprocess_fn(ex, processor):\n", " return processor(ex[\"image\"])\n", "\n", "calibration_dataset = quantizer.get_calibration_dataset(\n", " \"zh-plus/tiny-imagenet\",\n", " preprocess_function=partial(preprocess_fn, processor=preprocessor),\n", " num_samples=50,\n", " dataset_split=\"train\",\n", ")\n", "\n", "# Create the calibration configuration containing the parameters related to calibration.\n", "calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)\n", "\n", "# Perform the calibration step: computes the activations quantization ranges\n", "ranges = quantizer.fit(\n", " dataset=calibration_dataset,\n", " calibration_config=calibration_config,\n", " operators_to_quantize=static_qconfig.operators_to_quantize,\n", ")\n", "\n", "# Apply static quantization on the model\n", "model_quantized_path_static = quantizer.quantize(\n", " save_dir=\"models/vit-base-patch16-224-quantized-static\",\n", " calibration_tensors_range=ranges,\n", " quantization_config=static_qconfig,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check Model Size\n", "\n", "Again, check the size of the quantized model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%ls -lh models/vit-base-patch16-224-quantized-static" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validate Quantized Model\n", "\n", "Finally, validate the quantized model by comparing the result of the original model and the quantized model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "static_quantized_model = ORTModelForImageClassification.from_pretrained(model_quantized_path_static)\n", "sq_res = infer_ImageNet(static_quantized_model, preprocessor, image)\n", "print(\"Quantized model prediction (static):\", sq_res)" ] } ], "metadata": { "kernelspec": { "display_name": "model_optimization", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 2 }